Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-based Entity Aware Metric
Abstract
Interpretability and discriminative power are the two most basic requirements for an evaluation metric. In this paper, we report the mention identification effect in the B³, CEAF, and BLANC coreference evaluation metrics that makes it impossible to interpret their results properly. The only metric which is insensitive to this flaw is MUC, which, however, is known to be the least discriminative metric. It is a known fact that none of the current metrics are reliable. The common practice for ranking coreference resolvers is to use the average of three different metrics. However, one cannot expect to obtain a reliable score by averaging three unreliable metrics. We propose LEA, a Link-based Entity-Aware evaluation metric that is designed to overcome the shortcomings of the current evaluation metrics. LEA is available as branch LEA-scorer in the reference implementation of the official CoNLL scorer.
Similar Resources
Maximum Metric Score Training for Coreference Resolution
A large body of prior research on coreference resolution recasts the problem as a two-class classification problem. However, standard supervised machine learning algorithms that minimize classification errors on the training instances do not always lead to maximizing the F-measure of the chosen evaluation metric for coreference resolution. In this paper, we propose a novel approach comprising t...
On Coreference Resolution Performance Metrics
The paper proposes a Constrained Entity-Alignment F-Measure (CEAF) for evaluating coreference resolution. The metric is computed by aligning reference and system entities (or coreference chains) with the constraint that a system (reference) entity is aligned with at most one reference (system) entity. We show that the best alignment is a maximum bipartite matching problem which can be solved by ...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Evaluation of Coreference Resolution Tools for Polish from the Information Extraction Perspective
In this paper we discuss the performance of existing tools for coreference resolution for Polish from the perspective of information extraction tasks. We take into consideration the source of mentions, i.e., gold standard vs mentions recognized automatically. We evaluate three existing tools, i.e., IKAR, Ruler and Bartek on the KPWr corpus. We show that the widely used metrics for coreference e...
A Metric for Evaluating Discourse Coherence based on Coreference Resolution
We propose a simple and effective metric for automatically evaluating discourse coherence of a text using the outputs of a coreference resolution model. According to the idea that a writer tends to appropriately utilise coreference relations when writing a coherent text, we introduce a metric of discourse coherence based on automatically identified coreference relations. We empirically evaluate...
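The one-to-one alignment described in the CEAF entry above can be illustrated with a minimal Python sketch. It makes several assumptions not stated in the truncated abstract: entities are represented as sets of mention identifiers, the similarity function is the plain mention overlap (the variant usually called mention-based CEAF), and the best alignment is found by brute force over permutations rather than by an efficient bipartite matching algorithm. The names `phi_overlap` and `ceaf` are hypothetical and chosen only for this illustration.

```python
from itertools import permutations


def phi_overlap(entity_a, entity_b):
    """Assumed similarity: number of mentions two entities share."""
    return len(entity_a & entity_b)


def ceaf(key, response, phi=phi_overlap):
    """Return (precision, recall, f1) under the best one-to-one alignment.

    key and response are lists of sets of mention ids.
    Brute force: only practical for a handful of entities.
    phi is assumed to be symmetric, so the two sides can be swapped.
    """
    # Map every entity of the smaller side to a distinct entity of the larger
    # side; unmatched entities on the larger side contribute nothing.
    if len(key) <= len(response):
        small, large = key, response
    else:
        small, large = response, key

    best = 0
    for perm in permutations(range(len(large)), len(small)):
        total = sum(phi(small[i], large[j]) for i, j in enumerate(perm))
        best = max(best, total)

    # Each side is normalised by its self-similarity.
    key_norm = sum(phi(k, k) for k in key)
    response_norm = sum(phi(r, r) for r in response)

    recall = best / key_norm if key_norm else 0.0
    precision = best / response_norm if response_norm else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    key = [{"a", "b", "c"}, {"d", "e"}]
    response = [{"a", "b"}, {"c"}, {"d", "e", "f"}]
    print(ceaf(key, response))
```

For realistic document sizes the permutation loop would be replaced by a polynomial-time assignment solver; the brute-force version is used here only to keep the constraint (each entity aligned with at most one entity on the other side) easy to see.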